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ABSTRACT Recently, machine learning techniques have gained popularity for the 
medical diagnosis. Medical professionals use this approach to learn and detect the 
abnormalities of life-threatening chronic diseases. The increasing use of ML approaches 
may be due in part to better disease diagnosis enabled through improved symptom 
detection. The current study deployed different machine learning algorithms, including 
Decision Trees (DT), K-Nearest Neighbors (KNN), classifiers Multilayer Perceptron (MP), 
Support Vector Machines (SVM), and Random Forest (RF) for early predictions and 
symptoms of the disease. These models were capable of differentiating between benign 
and harmful cancer cells Benign tumours, which were non-cancerous and in most cases, 
non-lethal were mostly confined to the area from where they originated, however, it was 
observed that malignant cancer can start with abnormal cell growth in the human body. 
This abnormal cell growth can quickly spread to nearby tissues, which can cause 
infiltration of adjacent cells, resulting in a potentially fatal condition. Thereby, it was 
observed that Multilayer Perceptron (MLP) model provided the highest accuracy 
percentage of 86% when compared with all the other techniques in association with the 
accuracy rate of the models 


INDEX TERMS Decision Tree (DT), K-Nearest Neighbor (KNN), Logistic Regression 
(LR), Support Vector Machine (SVM), Machine Learning (ML), Multilayer Perceptron 
(MP), Random Forest (RF) 


I. INTRODUCTION 


Cancer is a complex disease, which is 
characterized by uncontrolled growth of 
abnormal tissue in the entire body. 
Normally, old or damaged cells are 
replaced by new and healthy cells to 
maintain a healthy functioning of body . 
Contrastingly, some damaged tissue 
incessantly grows and become a mass of 
tissues known as a tumour in the human 
body. There are two types of tumours, 
for instance, malignant and benign 


tissue. The current study focused on 
breast cancer tumours along with 
Machine Learning Algorithms (MLA). 
Primarily, the breast composed of two 
main types of tissue: glandular tissue and 
connective tissue. Glandular tissues are 
responsible for producing milk, whereas 
connective tissues provide structural 
support and shape the breast. Glandular 
tissues may convert into malignant 
tumours with the passage of time. Most 
breast cancers emanate in the cells of the 
lobules, the anatomical structures that 
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consist of milk-producing glands, or in 
the ducts embedded amid the breast 
tissue, which acts as the passageways 
that deliver milk from the lobules to the 
nipples. Breast stromal tissues, which 
are composed of fatty and fibrous 
connective tissues, can also cause breast 
cancer. The overall structural 
deformation of a woman's breast tissue 
caused by the presence of a malignant 
tumour is influenced by age-related 
changes in the amount of fatty and 
fibrous tissues in her body[1]. The death 
rate annually has significantly increased 
due to the rising cases of breast cancer. 
Recently, it has been observed that a 
massive death rate is due to breast 
cancer[2]. 


The majority of female patients normally 
do not have a good prognosis for it at the 
stage when it is eventually recognised 
for what it is—a cancer of breast tissue— 
which accounts for its high lethality./ Its 
high fatality rate is due to the unfortunate 
reality and less awareness regarding this 
disease, which has not only increased the 
poor-prognosis due to _ last-stage 
detection but it has also caused 
malignancy in breast tissues. Breast 
cancer usually develops by a genetic 
abnormality, or defect, in the inherent 
code. This inherited genetic abnormality 
can cause erroneous gene expression, 
which becomes a prime reason for its 
cause and development. Only 5—10% of 
the cancer cases were observed to be 
because of inherited abnormalities from 
your mother or father. Instead, 85—90% 
of breast cancer cases are due to genetic 
abnormalities, which arise with aging 


[3]. 


In order to improve different areas of 
treatment and raise patients' chances of 
survival, accurate cancer detection is 
important./ Accurate diagnosis of cancer 
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is essentially important to optimize 
various aspects of therapy and to 
increase the chances of survivability of 
patients. Many researchers have put 
forth various approaches for automatic 
cell classification to diagnose breast 
cancer in the recent years. Therefore, it 
was observed that Machine learning 
techniques stand out among them for the 
classification and prediction of breast 
cancer among female patients [4]. 
Machine learning methods were used to 
identify cancer and determine whether a 
tumour is present or not, it may be useful 
in the study of breast cancer. 
Additionally, these cancer tumors can be 
predicted using machine-learning 
techniques (MLAs). When using 
conventional methods of diagnosis, 
these cancer tumors might frequently go 
undetected for a long time [5]. Thereby, 
increasing the proportion of deaths 
brought by cancer. 


Machine learning techniques (MLs) 
have recently emerged as a highly 
relevant area in practical research, which 
are very constructive in the prompt 
diagnosis of breast cancer. Over the past 
ten years, using machine learning 
techniques for medical diagnosis has 
gained popularity. This increased use of 
ML techniques can be partly attributed to 
the fact that it made better disease 
identification and symptom detection 


[6]. 


The current study is divided into the 
following sub-sections. The reviews of 
literature is given in Section 2. 
Moreover, Section 3 explains the 
experiment performed and the adopted 
algorithms to show the obtained results. 
Furthermore, these results are elaborated 
in Section 4. Additionally, Sections 5 of 
the study dicusses the study results and 
concludes the research. 
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Il. LITERATURE REVIEW 


Machine Learing (ML) can be used to 
submit fresh diagnostic hypotheses, 
helping to create a more customized 
therapeutic proposal. Several different 
MLAs, such as Decision Trees, 
Multilayer Perceptrons, K-Nearest 
Neighbors, and Support Vector 
Machines (SVM) have been used in this 
study. The CSV file of the breast 
mammograms was first taken and 
classification models that had been 
trained to achieve the aforementioned 
objective were then used. In contrast to 
benign tumors, which are non-cancerous 
andnon-lethal and are confined to the 
area where they originate whereas 
malignant cancer starts with abnormal 
cell growth and the abnormal behavior of 
messages between cells [7]. There is a 
vast body of literature on this subject, 
however, In the current study aims to 
deploy a computational method that 
would predict breast cancer with of 
accuracy. 


On the Wisconsin datasets, practitioners 
analyzed the performance of four 
classifiers: Naive Bayes (NB), SVM, 
Decision Tree (DT), and K Nearest 
Neighbors (K-NN) [8]. SVM has been 
identified as the best amongst the others 
by achieving high accuracy percentage 
of 97.13% with the lowest error rate with 
respect to others' confusion matrices. It 
is important to consider this number 
because the conventional diagnostic 
methods have higher probability to occur 
errors, which would significantly impact 
the treatment strategies and staging of 
breast cancer. Currently, biopsy-proven 
breast malignancy is the most accurate 
method for diagnosing breast cancer, 
which also directly affects the medical 
professionals decisions . 


Haifeng developed a SVM-based 
ensembled model for the early prediction 
of cancer. The proposed ensembled 
model was made up of six different types 
of kernel functions and two different 
types of SVM structures, such as a-SVM 
and C-SVM [9]. In this study, two 
datasets from Wisconsin were used, 
namely WBC and WDBC, which were 
used to test the model (SEER). The 
intended model enhanced the diagnosis 
accuracy when compared to other 
researches using a single SVM. The 
major observed disadvantage of this 
strategy is that it requires a long training 
period and is computationally expensive. 
Resultantly, the validity became 
doubtful because patients possessing 
breast cancer were constantly at risk of 
the disease spreading. 


Several deep learning and data mining 
techniques were examined for 
veneration cancer by several 
practitioners [10]. However, according 
to previous researches only a few papers 
used genetics, , which claimed that 
imaging was used in majority of the 
publications. A variety of algorithms, 
such as CNN and Nave Bayes, were used 
in imaging techniques. However, 
Machine Learning Techniques (MLT), 
Decision Trees, SVM, and Random 
Forests (RF) were quite famous. 


Researchers validated and applied 
different neural networks (NNs) 
techniques , especially in early-stage 
cancer classification. They discovered 
that most NNs were capable of 
identifying cancerous cells and these 
cells are typically difficult to distinguish 
because they frequently resemble the 
Parr of healthy breast tissue. 
Furthermore, they illustrated that these 
cells must be observed for a specific 
amount of time by the experts to identify 
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their distinctive characteristics, 
including their structure and growth rate, 
which can differ from normal breast 
cells. The imaging method, however, 
required a lot of processing power to 
preprocess the images [11]. 


Sudarshan Nayak in his study used 3D 
images to illustrate Machine Learning 
Algorithms (MLA) to categorize breast 
cancer. Before classifying an abnormal 
mass in the breast parenchyma, many 
factors were taken into consideration. 
Based on his overall results, it was 
concluded that SVM is the best model 
for this purpose [12]. 


Similarly, , Youness Khoudfi and 
Mohamed Bahaj in a comparison of 
machine learning algorithms found that 
SVM is the best classifier, with an 
accuracy of 97.9%, when compared to 
K-NN, RF, and NB, which are based on 
multilayer perception with 5 layers and 
10 times cross-validation using MLP 
[13]. Moreover, Ahmed [14] prepared a 
method for estimating WBCD, which 
combined a clustering strategy with a 
potent probabilistic vector support 
machine, with a prediction rate of 
99.10% made by the SVM technique. 


A new approach was introduced by 
David A. [15], which employed linear 
discriminant analysis (LDA) to reduce 
the feature dimensionality and then 
deployed the new, reduced feature 
dataset with SVM. On the (BCWD) 
dataset, the practitioner applied five ML 
algorithms: (SVM), Logistic Regression 
(LR), Decision Tree (DT), Random 
Forest (RF), and KNN. A performance 
assessment and comparison of these 
various classifiers were done after 
getting the results. This study's main 
objective was to develop a confusion 
matrix to check the significance of 
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precision and accuracy [16]. All other 
classifiers outperformed the support 
vector (SV) machines, which had the 
highest accuracy of 97.2%. 


Utilizing the WEKA program from the 
Waikato Environment for Knowledge 
Analysis, Valentina Mikhailova 
compared the categorization algorithms. 
A number of 286 cases and 10 attributes 
made the dataset for this study. The J48, 
Nave Bayes, Random Forest (RF), MLP, 
K*, and SVM models were compared 
using various parameters. Metrics such 
as machine learning helped to assess the 
performance of the developed models 
[17]. The SVM algorithm and the J48 
model both offered the highest levels of 
accuracy, at 75.5% and 79.6%, 
respectively. 


To categorize a WBCD, the practitioner 
used naive Bayes, SVM, and Decision 
Trees (DT); support vector SVM, which 
produced the best results with an 
accuracy score of 96.99% [18]. Clinical 
information from medical intensive care 
units was used in this study. Machine 
learning techniques were used for the 
early detection of disease for the patients 
inside the hospital over the course of 24 
hours. The KNN and logistic regression 
produced the highest accuracy ratings 
when applied to training data [19]. 


The biggest problem with breast cancer, 
according to Nithya et al. [20], is 
classifying breast tumours because of the 
structural distortion these tumours cause. 
It is crucial to determine the type of 
tumour when one is considering the the 
correct form of prognosis because it 
determines the impact of tumour on the 
breast tissue. Computer-aided diagnosis 
(CAD) was used to check the 
significance of breast cancer tumours in 
the patient. Their main goal was to use 
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data mining technologies to improve 
breast cancer projection. The 
classification performance of many 
machine learning algorithms was 
enhanced using Bagging, Multiboot, 
Random Subspace, and Multilayer 
Perceptron. 


The classification of different patient 
groups into high-risk and low-risk 
patients were examined by Kourou et al. 
using Artificial Neural Networks 
(ANN), Decision Trees (DT), and SVM 
to present a model for cancer risks [21]. 


A study has been conducted using 
mammogram diagnosis on biopsy of 
breast cancer [22]. The study used 
traditional classification techniques such 
as LR, LDA, QDA, FR, and SVM in 
their practical work. Md. Milon Islam 
contrasts supervised machine learning 
algorithms like SVM, ANNs, and LR 
[23]. The UCI machine learning 
database, a well-known machine 
learning resource, is where the WBC 
dataset was taken from. The 
performance of the study was evaluated 
using the confusion matrix and 
correlational factors. Additionally, 
different approaches' receiver operating 
characteristic curves and precision-recall 
area under curves were assessed. The 
findings indicated that SVM had the 
highest accuracy among all applied 
algorithms, while ANNs have the 
highest values of accuracy, precision, 
and F1-score. 


Different versions of the DT algorithm 
for the diagnosis of cancer were 
employed using Mat lab, Python, and 


WEKA. The CART employed in Python 
gave 97.4% and 98.9% in terms of its 
accuracy and sensitivity, respectively, 
while in WEKA both DT algorithms 
achieved 95.3% and 95.3% accuracy, 
respectively [24] [25]. 


In the testing phase, NSVM, LPSVM, 
SSVM, and LPSVM all achieved the 
highest accuracy, sensitivity, and 
specificity, which were 96.5517%, 
98.2456%, 96.5517%, and 97.1429%, 
respectively. Yue et al. provided 
thorough evaluations of different models 
by using the standard WBCD dataset and 
Decision Tree (DT) methods, which 
were used to predict breast cancer. 
According to the practitioners by 
collaborating with two deep learning 
models the highest accuracy rate can be 
achieved. This architecture had a 
classification accuracy of 99.68%, but 
when combined with the clustering 
algorithm, the SVM method had a 
classification accuracy of 99.10%. They 
also studied the ensemble method, which 
employed voting to create the J48, SVM, 
and Naive Bayes models. With the 
ensemble method, an accuracy of 
97.13% was achieved, respectively [26]. 
Infrared imaging coupled with an agent 
previously administered to a patient can 
lead to a very accurate tumour detector, 
with a thermal sensitivity camera and 
model of the breast [27].In this research 
remote health care systems used 
technological paradigms and enablers to 
fulfill their needs of remote monitoring, 
remote aid, and research gaps, which 
were identified to stimulate the future 
research [28]. 
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TABLE I 
SUMMARY OF LITERATURE REVIEW 
Reference Model Method 
Support Vector Machine 
[8] (SVM), Decision Tree (C4.5), SVM improves breast cancer 
Naive Bayes (NB), and k diagnosis and treatment strategies. 
Nearest Neighbors (k-NN) 
Haifeng's ensemble model 
; demonstrated improved diagnosis 
[2] suppor yetar machine accuracy, but its long training time 
ensemble Algorithm i ; 
and computational expense raise 
concerns. 
Haifeng's ensemble model 
Deep learning and machine demansieated improved d ia gn Bets 
[10] : i accuracy, but its long training time 
learning algorithms : 
and computational expense raise 
concerns. 
Neural networks can identify 
Artificial Neural Network ce cells but require 
[11] (ANN) observation over time and significant 
processing power processing of image 
processing. 
Support Vector Machine 
(SVM), Random Forest, SVM is the best-supervised machine 
[12] Logistic Regression, Decision learning algorithm for breast cancer 
tree (C4.5), and K-Nearest classification. 
Neighbours (KNN) 
Random Forest, Naive Bayes, 
[13] Support Vector Machines SVM is the best classifier with 97.9% 
— SVM, and K-Nearest accuracy. 
Neighbors K-NN 
14] Support Vector Machine Clustering and SVM combined to 
= (SVM) predict WBCD with 99.10% accuracy. 
Pulse-Coupled Neural A new approach uses LDA to reduce 
[15] Networks (PCNN) and Deep feature dimensionality and applies 
= Convolutional Neural five ML algorithms to the BCWD 
Networks (CNN) dataset. 
Support Vector Machine 


[16] 


(SVM), Random Forest, 
Logistic Regression, Decision 
tree (C4.5), and K-Nearest 
Neighbours (KNN) 


Support vector machines had the 
highest accuracy (97.2%). 
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Reference Model Method 
Naive Bayes, J48, K*, Random 
[17] Forest, Multilayer Perceptron SVM and J48 models offer accuracy 
= (MLP) and Support Vector at 75.5% and 79.6% respectively. 
Machine (SVM) 
Naïve Bayes, Support vector 
[18] machines, Radial basis neural SVM produced the best results with 
networks, Decision trees J48, 96.99% accuracy. 
and simple CART 
Used data mining technologies to 
improve breast cancer prediction, 
; ; using bagging, multiboot, random 
[19,20] Randomized Controlled Trial subspace, and multilayer perceptron to 


(RCT) 


improve the classification 
performance of machine learning 
algorithms. 


HI. MATERIALS AND METHODS 


Mammography is a radiological method, 
which is used for screening breast cancer. 
Diagnostic mammography is a special kind 
of mammogram, which is used to detect 
abnormalities in females who have been 
diagnosed with having breast issues or 
cancer after the suggestion or advice of a 
medical professional. Women over the age 
of 45 are advised to undergo screening 
mammography to rule out any breast 
tumours or malignancies, but this 
diagnostic procedure in and of itself has 
risks related to radiation exposure for both 
young and malignant females. However, 
mammography only predicts 70% on a true 
positive scale many other unnecessary 
biopsies are performed to confirm the 
indication of this chronic disease. 


Recently, several Computer-Aided 
Diagnostics (CAD) approaches have been 
put forth to lessen the number of 
unnecessary screening biopsies. These 


systems have helped clinicians to choose 
between performing a screening biopsy for 
a suspicious patient on the behalf of a 
mammogram 


and may perform a 


comprehensive follow-up instead. Due to 
some important limitations, a biopsy is not 
usually used to diagnose breast masses. 


This data set can be used to classify benign 
or malignant)based on BI-RADS (Breast 
Imaging-Reporting and Data System) 
features and the patient's age, which is 
considered a benchmark classification for 
the breast cancer diagnoses. This system 
depicts the correlation with the likelihood 
of malignant breast cancer. Sensitivity and 
associated specificities can be determined 
by assuming that all instances with BI- 
RADS ratings greater than or equal to a 
specific value (ranging from 1 to 5) are 
malignant and all other cases are benign. 
These can demonstrate a CAD system's 
performance in comparison to radiologists. 
It can be noticed from previous literature 
that different practitioners use different 
methods. whereas the machine learning 
method was employed in the current study 
to check the significance of the dataset, 
which fills the gap. Previously, these type 
of models have not been applied to check 
the accuracy of cancer genome by any of 
the researcher but in this paper novel model 
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techniques have been used to aid the 
research in this field. 


A. DATA PREPARATION 


Data preprocessing is a tool that locates or 
eliminates outliers. Additionally, it 
eliminates self-contradiction. The dataset is 
usually reduced to just the sample code 
number. Its removal is justified by the fact 
that it does not affect illnesses. The dataset 
contains 16 missing values for traits. For 
this, the missing traits are replaced by the 
mean. A random selection process from the 
dataset has been employed to ensure that 
the data is distributed correctly. The dataset 
contains the following features such as id, 
diagnosis (M = malignant, B = benign), 
radius, texture, perimeter, area, 
smoothness, compactness, concavity, 
concave points, symmetry, and fractal 
dimension. Ten real-valued features are 
computed for each cell nucleus, namely 
mean, standard deviation for grey-scale 
values, and worst or largest of these 
features. The dataset can be understood by 
the description given in the Table II below. 


TABLE II 
DESCRIPTION OF DATASET 
Feature Role 
Dia ís the main feature of 
Enos classification 
Radiis mean of distances from the 
center 
Texture the standard deviation of gray- 
scale values 
Smoothness local variation in radius lengths 
Compactness perimeter^2 / area - 1.0 
Concavity the severity of concave portions 


of the contour 
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Feature Role 
Concave number of concave portions of 
points the contour 
Fractal "coastli imation" - 1 
dimension coastline approximation" - 
Class . : 
distribution 357 benign, 212 malignant 
B. MODEL SELECTION 


The learning process in machine learning 
strategies can be divided into two main 
categories: supervised learning and 
unsupervised learning. The following 
Machine learning models were used in this 
study, 


Support Vector Machine (SVM) 
Multilayer Perceptron (MLP) 
Decision Tree (DT) 

Random Forest (RF) 

K-Nearest Neighbor (KNN) 
Logistic Regression (LR) 


Different models were used to verify the 
best result for the dataset. The 
aforementioned methods were used in the 
research because they provided a better 
classification and interpretation. 
Moreover, a convolutional neural network 
in the future can be used to compare this 
work with a deep learning model. The 
methods, which were employed in this 
study are widely used in classification 
problems, which gave clear directions to 
conduct better analysis. Moreover, a 
generic flow diagram for better 
interpretation was used which is given 
below: 
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Training 
— 


Testing 


—> (fl 


Validation] Testing 


FIGURE 1. Proposed Framework 
C. TRAINING AND TESTING PHASE 


Primarily, the dataset was set into two 
chunks for training and testing. In the 
training phase, main features were 
extracted to classify the cancer type, 
whereas in the testing phase, the 
significance of the predicted case was 
checked in the confusion matrix. K fold 
cross-validation showed that one fold was 
used for testing, while k 1 folds were used 
repeatedly for training. Overfitting was 
prevented by using cross-validation. Data 
was partitioned in the study by using a ten- 
fold cross-validation technique. Each 
iteration used nine folds for training and the 
remaining one fold for testing. 


D. SETUP FOR EXPERIMENTATION 


Six machine learning approaches were 
utilizedin this study, which include SVM, 
KNN, DT, RFs, MLP, and LR. The 
predictions encompassed the benign or 
cancerous nature of cells. Intel(R) Core 
(TM) i3-1111G4 CPU @ 3.00GHz 
2995Mhz with 8 GB RAM were installed 
and used in this study. An open-source 
written library that was written in Python, 
namely Scikit-learn was employed for the 
analysis. Reports, which contained narrated 
text, live code, equations, and graphics 
were created by using an open-source tool, 
Colab. 


Features Extraction 
— 


Model 


Selection 


vta 


Model Classification 


— 


Evaluation 


Many performance metrics were employed 
to measure the effectiveness of Machine 
Learning Algorithms (MLAs). In case of 
the evaluation of the concerned parameters, 
confusion matrices were used such as TP, 
TN, FN, and FP, which were used to predict 
data as well as the real data. For all the 


methods employed, the calculated 
confusion matrix is as follows: 
TABLE MI 
SUPPORT VECTOR MACHINE (SVM) 
CONFUSION MATRIX 
Negative Positive 
Negative TN=84 FP=33 
Positive FN=13 TP=78 


In the case of SVM, TN, FN, FP, and TP 


are 84, 13, 33, and 78, respectively, 
whereas for KNN these are 81,16, 24, and 
87. 


TABLE IV 
K-NEAREST NEIGHBORS (KNN) 
CONFUSION MATRIX 

Negative Positive 
Negative TN=81 FP=24 
Positive FN=16 TP=87 


For the Decision Tree (DT), TN, FN, FP, 
and TP are calculated as 72, 25, 34, and 77, 
respectively. 


Innovative Computing Review 


Volume 3 Issue 1, Spring 2023 


Rana et al. 


TABLE V 
DECISION TREE (DT) CONFUSION 
MATRIX 
Negative Positive 
Negative TN=72 FP=34 
Positive FN=25 TP=77 


For the Multilevel Perceptron, TN, FN, FP, 
and TP are calculated as 20, 09, 21, and 88, 
respectively. 


Table VI 
Multilevel Perceptron (MP) Confusion 
Matrix 
Negative Positive 
Negative TN=20 FP=21 
Positive FN=09 TP=88 


For the Linear Regression, TN, FN, FP, and 
TP are calculated as 88, 11, 20, and 89, 
respectively. 
TABLE VII 
LINEAR REGRESSION (LR) 
CONFUSION MATRIX 


Negative Positive 
TN=88 FP=20 
Positive FN=11 TP=89 

For the Random Forest (RF), TN, FN, FP, 
and TP are calculated as 80, 19, 21, and 88. 
respectively. 


Negative 


TABLE VIII 
RANDOM FOREST (RF) CONFUSION 
MATRIX 
Negative Positive 
Negative TN=80 FP=21 
Positive FN=19 TP=88 
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IV. RESULTS 


The study showed an accuracy of 86%, 
which was the highest accuracy achieved 
through MLP and the second highest 
accuracy achieved was 85.2%, which 
obtained through Logistics Regression. 
Additionally, the highest precision value 
(0) was achieved by MLP and Logistic 
Repression in which MLP achieved 81% of 
accuracy in case 1. The least value of 
precision was 75%, which was achieved by 
SVM for malignant or benign cases. LR 
provided the highest recall rate (0), whereas 
the lowermost values were obtained 
through the Decision Tree (DT) and SVM. 
MLP provided the highest recall rate (0) 
and the least value was given by the 
Decision Tree (DT) and SVM. 


In terms of the F1 score rate, the highest 
value (0) was achieved by MLP,LR, and the 
Decision Tree (DT), whereasSVM 
provided the lowest value. The highest 
recall rate (1) was achieved through MLP, 
whereas SVM provided the lowest value. 
TABLE Ix 
ACCURACY ATTAINED BY MODELS 


Model Accuracy(%) 
SVM 78 
KNN 81 
DECISION TREE 72 
LR 85 
RANDOM FOREST 81 
MLP 86 


After the accuracy, it is significant to know 
the precision, recall, and F1 score to check 
the significance of the obtained results. 
Hence, these results are discussed in the 
discussion section for further interpretation. 
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FIGURE 2. Comparison of models 


The MLP model provided the highest 
accuracy percentage when compared toall 
other techniques in association with the 
accuracy in Figure 1. We are going to 
discuss other things in the discussion 
section below. 


V. DISCUSSION AND CONCLUSION 
To evaluate the specific terms by their 
equivalent formula in the investigation, 
following factors have been abundantly 
used. There are several characteristics, 
which are comparable to those that tend to 
describe the associations as substantial to 
amply measure a system's performance. 
The experimental results are given in Table 
vI. 


TABLE X 

MACHINE LEARNING AND DEEP LEARNING MODELS RESULTS 

Accuracy Precision(0) Precision(1) Recall(0) Recall(1)  Flscore(0) Flscore(1) 
SVM 78% 75% 68% 69% 74% 72% 71% 
KNN 81% 84% 71% 78% 84% 81% 80% 
DT 73% 75% 68% 69% 74% 72% 71% 
LR 85% 89% 81% 82% 89% 85% 85% 
RFT 81% 82% 79% 81% 81% 81% 80% 
MLP 86% 91% 81% 81% 91% 85% 86% 


For technique, the confusion matrix was 
calculated. For the dataset with 831 
instances, 75% of the data instances were 
employed for training models and for 


testing, whereas 25% of the data was used. 
The confusion matrix of these machine 
learning algorithms was shown, which 
provided the results for SVM, DT RF, K- 
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NN, MLP, and LR, respectively. The 
confusion matrix was seen in its 
combination in the table above. 


However, It can forecast the greatest 
number of positives when any of the six 
strategies is noticed to be true. Logistics 
Regression Models (LRM) may forecast 
the least amount of positives when they are 
false positives in addition to predicting the 
greatest number of true positives. Logistic 
Regression (LR) predicted the lowest false- 
positive rate, whereas the false-positives 
highest value was achieved by LR and 
MLP. 


The decision Tree (DT) provided the 
highest rate where false negatives were 
concerned, with the lowest rate being 
achieved by MLP. The F1 score for all the 
techniques is almost 97%, which was 
significantly better. LR predicted the 
highest value of True-Negative, whereas 
MLP provided the lowest value. 
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